Network visualization (in R) with “netplot” and motif counting (in C++) with “barry”
SCI Seminar
Division of Epidemiology
University of Utah
2023-04-07
Research Assistant Professor of Epidemiology.
Ph.D. in Biostatistics from USC and M.Sc. in Economics from Caltech.
Methodologist working at the intersection between Statistical Computing and Complex Systems Modeling.
You can download the slides from ggv.cl/slides/sci2023.
What: An R package for network visualization inspired by Gephi.
Why: An opinionated way to visualize graphs.
In the case of ggplot2 (and thus, ggraph):
While ggplot2 uses grid underneath its grammar API, these features are generally not directly available in ggplot2.
– Thomas Lin Pedersen, author of ggraph (source: tidyverse.org)
The gggrid package does:
The ‘ggplot2’ package does not yet have an interface for pattern fills, but the ‘gggrid’ package (Murrell, 2022) allows us to combine raw ‘grid’ output with the ‘ggplot2’ plot.
– Paul Murrell, author of grid (source: Vectorised Pattern Fills in R Graphics)
Visualization engine: The grid system (the same one used by ggplot2).
Layout algorithms: Default uses igraph’s layout.
Vertex sizes: Relative to the drawing area.
The personal friendship network of a faculty of a UK university, consisting of 81 vertices (individuals) and 817 directed and weighted connections. The school affiliation of each individual is stored as a vertex attribute. This dataset can serve as a testbed for community detection algorithms.
Things to notice:
Vertex size autoscaled to the device size.
Edges colored by mixing the ego and alter colors (source + target).
Edges change colors continuously (gradient.)
Vertices and edges’ sizes scale as required by the user.
Graphical objects (Grobs)
List of 11
$ .xlim : num [1:2] -1 1
$ .ylim : num [1:2] -0.5 0.5
$ .layout : num [1:81, 1:2] 0.6661 0.0201 0.7327 0.5399 -0.4903 ...
$ .edgelist : num [1:817, 1:2] 57 76 12 43 28 58 7 40 5 48 ...
$ .N : int 81
$ .M : int 817
$ name : chr "graph.3"
$ gp : NULL
$ vp : NULL
$ children :List of 2
..$ background:List of 10
.. ..- attr(*, "class")= chr [1:3] "rect" "grob" "gDesc"
..$ graph :List of 5
.. ..- attr(*, "class")= chr [1:3] "gTree" "grob" "gDesc"
..- attr(*, "class")= chr "gList"
$ childrenOrder: chr [1:2] "background" "graph"
- attr(*, "class")= chr [1:4] "netplot" "gTree" "grob" "gDesc"
netplot supports advanced patterns. The figures feature radial gradients (vertices), linear gradients, and repeated patterns (background).
Speed up the code: grid objects can be computationally expensive to build.
Porter Bischof (Undergrad from UVU) will contribute and present at the INSNA Sunbelt conference (flagship conference of SNA).
What: A C++ header-only template library for motif counting (and more.)
Why: Implement Discrete Exponential-Family Models [DEFMs] for phylogenetics and social network analysis.
Where: You can get it on GitHub (USCBiostats/barry)
About 11 K lines of C++ code built for statistical modeling:
Motif count using change statistics (we will return to that.)
Full and constrained enumeration of 0/1 arrays.
Computes probability function for Discrete Exponential-Family Models [DEFMs].
Memory- and computationally-efficient for pooled models.
Change statistics are at the core of ERGMs (Exponential-Family Random Graph Models).
Two great applications: (i) they make counting easy, and (ii) they can be used to sample from the ERGM likelihood function.
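The change statistic is defined as a real-valued vector where the \(k\)-th entry equals the observed change in the \(k\)-th statistic when the \(ij\)-th tie is toggled from absent to present (equivalently, the amount lost when that tie is removed from the network). Formally: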
\[ \delta(y_{ij}: 0\to 1) = s(\mathbf{y})_{ij}^+ - s(\mathbf{y})_{ij}^- \]
Where \(s(\cdot)\) is a function returning graph \(\mathbf{y}\)’s observed statistics, \(s(\mathbf{y})_{ij}^+\) represents its value when \(y_{ij} = 1\), and \(s(\mathbf{y})_{ij}^-\) its value when \(y_{ij} = 0\).
In an ERGM, the conditional log-odds of tie \(ij\) are linear in the change statistics:
\[ \mbox{logit}\left(\mathbb{P}\left(y_{ij} = 1 \mid y_{-ij}\right)\right) = \theta^{\mathbf{t}}\delta\left(y_{ij}:0\to 1\right), \]
with \(\delta\left(y_{ij}:0\to 1\right)\equiv s\left(\mathbf{y}\right)_{ij}^+ - s\left(\mathbf{y}\right)_{ij}^-\) as the vector of change statistics, in other words, the difference between the statistics observed when \(y_{ij} = 1\) and when \(y_{ij} = 0\). Solving for the probability gives:
\[ \mathbb{P}\left(y_{ij} = 1 \mid y_{-ij}\right) = \frac{1}{1 + \exp\left\{-\theta^{\mathbf{t}}\delta\left(y_{ij}:0\to 1\right)\right\}} \]
Let’s look at the change statistics for edge count, triangles, and group-homophily when we toggle tie 33-69, comparing the network without the tie (y-) and with it (y+).
| s() | y- | y+ | change |
|---|---|---|---|
| Edgecount | 816 | 817 | 1 |
| Triangles | 5366 | 5399 | 33 |
| Group-homophily | 664 | 665 | 1 |
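Read as a worked instance of the definition, the change-statistic vector for tie 33-69 is the row-by-row difference between the y+ and y- columns of the table:

\[ \delta\left(y_{33,69}: 0\to 1\right) = \left(817 - 816,\; 5399 - 5366,\; 665 - 664\right) = \left(1,\; 33,\; 1\right) \]

For a hypothetical parameter vector \(\theta = (\theta_{\mbox{edges}}, \theta_{\mbox{triangles}}, \theta_{\mbox{homophily}})\) (these labels are an assumption for illustration, not notation from the slides), the conditional log-odds of that tie would then be:

\[ \theta_{\mbox{edges}} + 33\,\theta_{\mbox{triangles}} + \theta_{\mbox{homophily}} \]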
Exponential-Family Random Graph Models [ERGMs].
DEFMs for multiple correlated outcomes (Markov Random Fields; in development with Drs. MJ Pugh and Tom Valente.)
Motif counting applied to counting “imaginary motifs” in Cognitive Social Structures [CSS] (with Dr. Kyosuke Tanaka, submitted to Social Networks).
Modeling the evolution of gene functions in terms of transition between functional states (research grant submitted to National Human Genome Research Institute NHGRI).
The netplot R package for graph visualization.
barry: Your go-to motif accountant.
fmcmc | ergmito | aphylo | netdiffuseR | ABCoptim
slurmR | barry | rgexf
george.vegayon at utah – https://ggvy.cl